Toward a Universal Platform for Integrating Embodied Conversational Agent Components
Abstract
Embodied Conversational Agents (ECAs) are computer-generated, life-like characters that interact with human users in face-to-face conversation. To achieve natural multi-modal conversations, ECA systems are highly sophisticated and require many building assemblies, which makes them difficult for individual research groups to develop. This paper proposes a generic architecture, the Universal ECA Framework, currently under development, which includes a blackboard-based platform and a high-level protocol for integrating general-purpose ECA components and easing the prototyping of ECA systems.

1. The Essential Components of Embodied Conversational Agents and the Issues to Integrate Them

Embodied Conversational Agents (ECAs) are computer-generated, life-like characters that interact with human users in face-to-face conversation. To achieve natural communication with human users, an ECA system requires many software and hardware assemblies. By their function in the information flow of the interaction with human users, these can be divided into four categories:

ECA Assemblies in the Input Phase. Non-verbal behaviors are the indispensable counterpart of verbal information in human conversation, so embodied agents must be able to handle both. In addition to capturing natural-language speech, ECA research acquires non-verbal behaviors such as head movements, gaze directions, hand gestures, facial expressions, and emotional states through various types of sensors or visual methods. Input-understanding tasks such as speech and gesture recognition also have to be performed in this phase.

ECA Assemblies in the Deliberate Phase. This is the central part of an intelligent agent, determining its behavior in response to inputs from the outside environment. An inference engine with a background knowledge base and a dialogue manager are required to conduct a discourse plan that achieves the ECA's conversational goal according to the agent's internal mental state. Talking to a conversational agent without emotions and facial expressions feels unnatural and quickly becomes tiresome, so, as with humans in the real world, models of personality, emotion, culture, and social role are incorporated into ECAs to improve their believability.

ECA Assemblies in the Output Phase. Verbal output, or natural-language synthesis, is generally performed by a Text-To-Speech (TTS) engine that speaks the text produced by the dialogue manager. Spontaneous non-verbal outputs such as facial expressions, eye blinks, spontaneous hand gestures, and body movements are generated either randomly or from the syntactic information of the accompanying utterance, using the results of statistical analysis as in CAST [5]. Finally, a 2D/3D character animation player is necessary to render the virtual character's body, and possibly the virtual environment in which the character resides, on the screen.

A Platform for Integrating ECA Components. To integrate all the various assemblies of an ECA system described above, a platform or framework that connects them seamlessly is critical. This platform has to transport all sensor data streams, decisions, and command messages between the components. Four essential requirements have been proposed for the ECA component-integration problem [4, 6]. First, the platform has to keep all output modalities consistent with the agent's internal mental state.
Second, all verbal and non-verbal outputs must be synchronized. Third, ECAs have to respond to their human users in real time. Fourth, both directions of information flow, "pull data from a component" and "push data to a component", must be supported.

2. Universal Embodied Conversational Agent Framework

ECA systems are so sophisticated, and their functions span so many research disciplines, that virtually no single research group can cover all aspects of a full ECA system. Moreover, software developed from an individual research result is usually not designed to cooperate with other software. A number of outstanding ECA systems have been proposed previously; however, their architectures are ad hoc [2] and not meant for general-purpose use. If, instead, there were a common, generic backbone framework connecting a set of general-purpose, reusable, modularized ECA components that communicate through a well-defined common protocol, rapid building and prototyping of ECA systems would become possible, and redundant effort and resource use in ECA research could be avoided.

This work proposes such an architecture to ease the development of general-purpose ECA systems. In our current design it contains three parts: a general-purpose platform (the Universal ECA Platform), composed of a set of server programs that mediate and transport data streams and command messages among stand-alone ECA software modules; a specification of a high-level protocol based on XML messages (UECAML), used for communication between a standardized set of ECA components; and an application programming interface (the UECA API) for easy development of wrappers for ECA software modules. These basic concepts are shown in Fig. 1.

We use the blackboard model as the backbone platform and OpenAIR [6] as the low-level routing and message-passing protocol for the following reasons:

− The distributed architecture and XML absorb differences in the operating systems and programming languages of components and spread out the computational load.
− Unlike a long pipelined architecture, the single-layer topology makes it possible to support reflexive behaviors that bypass the agent's deliberation.
− The weak inter-connectivity of the components allows components to be switched online, and thus upgraded and maintained online.
− Components of differing complexity can be integrated into the ECA system as long as they understand and generate the same message types, and the system can still work even when some components are absent.
− Multiple logically isolated blackboards can distribute the information traffic that, in traditional systems, is concentrated on a single blackboard.

Based on this framework, we are specifying an XML-based high-level protocol for communication between ECA components. Every XML message belongs to a message type, for example "input.speech.text" or "output.body.gesture". Each message type has a specified set of elements and attributes, for example "intensity", "time_interval", and "start_time". Each component subscribes to the message types it is interested in, reads such messages from the blackboard when another component publishes them, generates its own output, and publishes messages of other types back to the blackboard. The sketch below illustrates this message format.
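Since the UECAML specification is still being defined, the exact schema is not fixed; the following is a minimal sketch of what two such messages might look like. Only the message-type names ("input.speech.text", "output.body.gesture") and attribute names ("intensity", "start_time", "time_interval") come from the text above; the element layout and all values are illustrative assumptions.

    <!-- Hypothetical UECAML messages: element layout and values are
         illustrative; only the type and attribute names appear in the paper. -->
    <message type="input.speech.text" start_time="12.40" time_interval="1.80">
      <text>could you repeat that</text>
    </message>

    <message type="output.body.gesture" start_time="14.50"
             time_interval="1.20" intensity="0.7">
      <gesture>head_nod</gesture>
    </message>

Under this scheme, a speech recognition component would publish messages of the first type, while a component such as the animation player would subscribe to the second type and realize the gesture on the character.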
At the current stage we are focusing on the specification of the input and output phases, and we have categorized the message types occurring in the I/O procedure into an abstract hierarchy of three layers in the blackboard, ordered by their level of abstraction. This basic idea is depicted in Fig. 2(a) and described below.

Low-level Parameter Layer in the Input Phase: To absorb the variance in the lowest-level raw sensor data, even within the same modality, the sensor-data-handling components interpret raw data into low-level parameterized representations and then write those to the blackboard. For example, rather than the raw wave data from the voice-capture subsystem, the user's voice is interpreted into a recognized text stream by a speech recognition component; rather than the absolute positions and angles of the sensors of the motion-capture system, the numbers are transformed into the joint angles of a human body. As a result, the combined output of the components at this stage is a parameterized text representation of the human user's movements, including the angles of body joints, eye gaze directions, and facial expression primitives. A wrapper component for this layer might look like the sketch below.
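As a concrete illustration of this layer, the following Python sketch wraps a motion-capture sensor: it converts absolute sensor positions into a joint angle and publishes the result as a low-level parameter message. The blackboard client, the message type "input.body.joint_angles", and all other names in the code are our own hypothetical stand-ins, not part of the UECA API or UECAML as published.

    import math
    import xml.etree.ElementTree as ET

    class BlackboardClient:
        """Hypothetical stand-in for the UECA API blackboard client; the
        real platform would route messages through an OpenAIR server."""
        def __init__(self):
            self._subscribers = {}  # message type -> list of callbacks

        def subscribe(self, msg_type, callback):
            self._subscribers.setdefault(msg_type, []).append(callback)

        def publish(self, msg_type, xml_bytes):
            for callback in self._subscribers.get(msg_type, []):
                callback(xml_bytes)

    def elbow_angle(raw):
        """Convert absolute 2D sensor positions (shoulder, elbow, wrist)
        into an elbow joint angle in degrees -- the kind of low-level
        parameter this layer writes to the blackboard instead of the
        raw numbers."""
        sx, sy = raw["shoulder"]
        ex, ey = raw["elbow"]
        wx, wy = raw["wrist"]
        upper = math.atan2(sy - ey, sx - ex)   # elbow -> shoulder direction
        lower = math.atan2(wy - ey, wx - ex)   # elbow -> wrist direction
        return abs(math.degrees(upper - lower))

    def on_raw_motion(raw_sample, blackboard, start_time):
        """Interpret one raw motion-capture sample and publish it as a
        hypothetical "input.body.joint_angles" message."""
        msg = ET.Element("message", {
            "type": "input.body.joint_angles",   # illustrative message type
            "start_time": f"{start_time:.2f}",
        })
        joint = ET.SubElement(msg, "joint", {"name": "elbow.right"})
        joint.text = f"{elbow_angle(raw_sample):.1f}"
        blackboard.publish("input.body.joint_angles", ET.tostring(msg))

    if __name__ == "__main__":
        bb = BlackboardClient()
        # A downstream component (e.g. a gesture recognizer) subscribes here.
        bb.subscribe("input.body.joint_angles", lambda m: print(m.decode()))
        sample = {"shoulder": (0.0, 1.40), "elbow": (0.25, 1.10),
                  "wrist": (0.50, 1.30)}
        on_raw_motion(sample, bb, start_time=0.0)

The point of the sketch is the division of labor: the wrapper hides the sensor-specific raw format, so every component above this layer sees only the parameterized text representation, regardless of which capture device produced it.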